Memory Compression

Since a replay buffer stores a large number of transitions, memory efficiency is one of its most important properties.

cpprb provides two optional features, next_of and stack_compress, which you can enable when constructing a replay buffer.

next_of and stack_compress can be used together, but currently neither of them is compatible with the N-step replay buffer.

These memory compressions rely on the internal memory alignment, so they cannot be used in situations where sequential steps are not stored sequentially (e.g. distributed reinforcement learning).

1 next_of

1.1 Overview

In reinforcement learning, training usually uses the pair of observations before and after a certain action, so both are saved in the replay buffer together. Naively, every observation is therefore stored twice.

A replay buffer is a ring buffer, so the "next" value of index i is already stored at index i+1, except at the newest edge.

If you specify the next_of argument (whose type is str or an array-like of str), the "next values" of the specified names are created in the replay buffer automatically, and they share memory with the originals.

The name of a next value is the original name with the prefix next_ (e.g. next_obs for obs, next_rew for rew, and so on).

This functionality incurs small penalties for manipulating sampled indices and for checking the cache at the newest index. (As far as I know, this penalty is not significant, and you will likely not notice it.)

1.2 Example Usage

import numpy as np
from cpprb import ReplayBuffer

buffer_size = 256

rb = ReplayBuffer(buffer_size,
                  {"obs": {"shape": (84,84)},
                   "act": {"shape": 3},
                   "rew": {},
                   "done": {}},
                  next_of="obs") # Do not specify "next_obs" manually; it is created automatically.
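The memory-sharing idea behind next_of can be illustrated with a minimal pure-NumPy ring buffer. This is only a sketch of the concept, not cpprb's actual implementation; all names and sizes here are made up for illustration.

```python
import numpy as np

# Sketch of the next_of idea: store each observation only once in a
# ring buffer; the "next" observation of index i is simply the entry
# at index i+1 (modulo the buffer size), so nothing is duplicated.
capacity = 4
obs_buf = np.zeros((capacity, 2))  # toy 2-dimensional observations

for step in range(6):  # write more steps than capacity to force wraparound
    obs_buf[step % capacity] = step

def next_obs(index):
    """The next observation shares storage: just read the following slot."""
    return obs_buf[(index + 1) % capacity]

# After 6 writes, slot 0 holds step 4 and slot 1 holds step 5,
# so the "next" of slot 0 is the contents of slot 1.
assert np.all(next_obs(0) == obs_buf[1])
```

A real implementation also has to special-case the newest index, whose "next" value has not been written into the ring yet; that is exactly the cache described in the Technical Detail section below.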


1.3 Notes

cpprb does not check the consistency of the i-th next_foo and the (i+1)-th foo. This is the user's responsibility.

Since next_foo is automatically generated, you must not specify it in the constructor manually.

1.4 Technical Detail

Internally, next_foo is not stored in the ring buffer itself but in its cache. (An error is still raised if you do not pass next_foo to add.)

When sampling next_foo, the sampled indices (a numpy.ndarray) are shifted by one (wrapping around if necessary), then checked against the newest edge of the ring buffer. For indices on the edge, the cached value is extracted instead.
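The index manipulation described above can be sketched in a few lines of NumPy. The buffer size and index values below are hypothetical, chosen only to show the shift, the wraparound, and the edge check:

```python
import numpy as np

# Illustrative sketch (not cpprb's real code) of sampling-time index
# handling: shift sampled indices by one, wrap them around the ring
# buffer, and detect the newest edge where the cached value is needed.
buffer_size = 8
newest_index = 5  # hypothetical position of the most recently added item

indices = np.array([3, 5, 7])          # sampled indices
shifted = (indices + 1) % buffer_size  # "next" indices, wrapped around
on_edge = indices == newest_index      # these must use the cached value

assert list(shifted) == [4, 6, 0]
assert list(on_edge) == [False, True, False]
```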

2 stack_compress

2.1 Overview

stack_compress is designed for compressing stacked (or sliding-window) observations. A famous use case is Atari video games, where 4 frames of the display are treated as a single observation and the next observation slides by only 1 frame (e.g. frames 1-4, 2-5, 3-6, ...). In this example, a straightforward approach stores every frame 4 times.
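A back-of-the-envelope calculation shows the scale of that duplication. The frame size matches the Atari example; the transition count is a hypothetical round number:

```python
# Rough arithmetic for the Atari example: 4 stacked 84x84 uint8 frames
# per observation, over a hypothetical 1,000,000 stored transitions.
frame_bytes = 84 * 84          # one uint8 frame
stacked = 4 * frame_bytes      # naive storage per stacked observation
transitions = 1_000_000

naive = stacked * transitions            # every frame stored ~4 times
compressed = frame_bytes * transitions   # roughly one new frame per step

# Sliding by one frame means naive storage is about 4x larger.
assert naive // compressed == 4
```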

cpprb with stack_compress avoids storing duplicated frames within stacked observations (except at the end edge of the internal ring buffer) by utilizing a NumPy striding trick.

You can specify the stack_compress parameter, whose type is str or an array-like of str, in the constructor.

2.2 Sample Usage

The following sample code stores 4-stacked frames of 16x16 data as a single observation.

import numpy as np
from cpprb import ReplayBuffer

rb = ReplayBuffer(32,
                  {"obs": {"shape": (16,16,4)}, "rew": {}, "done": {}},
                  next_of="obs", stack_compress="obs")


2.3 Notes

For compatibility with OpenAI Gym, the last dimension is treated as the stack dimension (which does not match C array memory order).

For the sake of performance, cpprb does not check that the overlapping data are truly identical; it simply overwrites them with new data. Users must not specify stack_compress for non-stacked data.

2.4 Technical Detail

Technically speaking, numpy.ndarray (and any other type supporting the buffer protocol) has properties describing its item data type, number of dimensions, length of each dimension, memory stride of each dimension, and so on. Normally, distinct elements never overlap in memory; stack_compress, however, intentionally overlaps memory addresses along the stacked dimension.
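The overlapping-stride idea can be demonstrated with NumPy's as_strided. This is a sketch of the underlying mechanism, not cpprb's internals; the toy one-number "frames" stand in for real observations:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Demonstration of overlapping memory along the stacked dimension:
# build 4-frame sliding windows over a 1-D sequence of toy "frames"
# without copying any frame.
frames = np.arange(10, dtype=np.float64)
stack = 4
windows = as_strided(frames,
                     shape=(len(frames) - stack + 1, stack),
                     strides=(frames.strides[0], frames.strides[0]))

# windows[i] is frames[i:i+4]; consecutive windows share 3 of 4 entries.
assert windows.shape == (7, 4)
assert list(windows[0]) == [0, 1, 2, 3]
assert list(windows[1]) == [1, 2, 3, 4]

# No copy was made: the windows view the same memory as frames.
assert np.shares_memory(windows, frames)
```

Note that writing through such a view modifies every window that shares the affected memory, which is why cpprb simply overwrites overlapped data rather than validating it.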